Percolation in Directed Scale-Free Networks 






N. Schwartz (1), R. Cohen (1), D. ben-Avraham (2), A.-L. Barabási (3) and S. Havlin (1)

(1) Minerva Center and Department of Physics, Bar-Ilan University, Ramat-Gan, Israel
(2) Department of Physics, Clarkson University, Potsdam NY 13699-5820, USA
(3) Department of Physics, University of Notre Dame, Notre Dame IN 46556, USA

Many complex networks in nature have directed links, a property that affects the network's 
navigability and large-scale topology. Here we study the percolation properties of such directed 
scale-free networks with correlated in- and out-degree distributions. We derive a phase diagram that
indicates the existence of three regimes, determined by the values of the degree exponents. In the 
first regime we regain the known directed percolation mean field exponents. In contrast, the second 
and third regimes are characterized by anomalous exponents, which we calculate analytically. In the 
third regime the network is resilient to random dilution, i.e., the percolation threshold is $p_c \to 1$.

Recently the topological properties of large complex networks such as the Internet, the WWW,
electric power grids, and cellular and social networks have drawn considerable attention. Some of
these networks are directed. For example, in social and economic networks, if node A gains
information or acquires physical goods from node B, it does not necessarily mean that node B gets
similar input from node A. Likewise, most metabolic reactions are one-directional, so changes in
the concentration of molecule A affect the concentration of its product B, but the reverse is not
true. Despite the directedness of many real networks, the modeling literature, with few notable
exceptions, has focused mainly on undirected networks.

An important property of directed networks can be captured by studying their degree distribution,
$P(j,k)$, the probability that an arbitrary node has $j$ incoming and $k$ outgoing edges. Many
naturally occurring directed networks, such as the WWW, metabolic networks, and citation networks,
exhibit a power-law, or scale-free, degree distribution for the incoming or outgoing links:



$$P_{\mathrm{in(out)}}(l) = c\, l^{-\lambda_{\mathrm{in(out)}}}, \qquad l \ge m, \tag{1}$$



where $m$ is the minimal connectivity (usually taken to be $m = 1$), $c$ is a normalization factor,
and $\lambda_{\mathrm{in(out)}}$ are the in(out)-degree exponents characterizing the network. An
important property of scale-free networks is their robustness to random failures, coupled with an
increased vulnerability to attacks. Recently it has been recognized that this feature can be
addressed analytically, in quantitative terms, by combining graph-theoretical concepts with ideas
from percolation theory. Yet, while the percolative properties of undirected networks are much
studied, little is known about the effect of node failure in directed networks. As many important
networks are directed, it is important to fully understand the implications for their stability.
Here we show that directedness has a strong impact on the percolation properties of complex
networks, and we draw a detailed phase diagram.



The structure of a directed graph has been characterized previously, both in general and in the
context of the WWW. In general, a directed graph consists of a giant weakly connected component
(GWCC) and several finite components. In the GWCC every site is reachable from every other,
provided that the links are treated as bidirectional. The GWCC is further divided into a giant
strongly connected component (GSCC), consisting of all sites reachable from each other following
directed links. All the sites reachable from the GSCC are referred to as the giant OUT component,
and the sites from which the GSCC is reachable are referred to as the giant IN component. The GSCC
is the intersection of the IN and OUT components. All sites in the GWCC that are in neither the IN
nor the OUT component are referred to as the "tendrils" (see Fig. 1).

FIG. 1. Structure of a general directed graph.

For a directed random network of arbitrary degree distribution, the condition for the existence of
a giant component can be deduced along the lines of earlier work. If a site, $b$, is reached
following a link pointing to it from site $a$, then it must have at least one outgoing link, on
average, in order to be part of a giant component. This condition can be written as

$$\langle k_b \mid a \to b \rangle = \sum_{j_b,k_b} k_b\, P(j_b,k_b \mid a \to b) = 1, \tag{2}$$



where $j$ and $k$ are the in- and out-degrees, respectively, $P(j_b,k_b \mid a \to b)$ is the
conditional probability given that site $a$ has a link leading to $b$, and
$\langle k_b \mid a \to b \rangle$ is the conditional average. Using Bayes' rule we get

$$P(j_b,k_b \mid a \to b) = \frac{P(j_b,k_b,a \to b)}{P(a \to b)}
  = \frac{P(a \to b \mid j_b,k_b)\,P(j_b,k_b)}{P(a \to b)}.$$

For random networks $P(a \to b) = \langle k \rangle/(N-1)$ and
$P(a \to b \mid j_b,k_b) = j_b/(N-1)$, where $N$ is the total number of nodes in the network.
Substituting into Eq. (2) gives $\langle jk \rangle / \langle k \rangle = 1$ at the threshold, so
the criterion for the existence of a giant component reduces to

$$\langle jk \rangle > \langle k \rangle. \tag{3}$$
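
The criterion in Eq. (3) can be checked directly on an empirical list of in- and out-degrees. The following is a minimal sketch; the function names `degree_moments` and `has_giant_component` and the toy degree list are illustrative assumptions, not part of the original analysis.

```python
from typing import Iterable, Tuple


def degree_moments(degrees: Iterable[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (<k>, <jk>) for (in-degree j, out-degree k) pairs."""
    pairs = list(degrees)
    if not pairs:
        raise ValueError("need at least one node")
    n = len(pairs)
    mean_k = sum(k for _, k in pairs) / n
    mean_jk = sum(j * k for j, k in pairs) / n
    return mean_k, mean_jk


def has_giant_component(degrees: Iterable[Tuple[int, int]]) -> bool:
    """Eq. (3): a giant component is expected when <jk> > <k>."""
    mean_k, mean_jk = degree_moments(degrees)
    return mean_jk > mean_k


# Hypothetical toy data: total in-degree equals total out-degree (both 6).
toy = [(1, 1), (2, 3), (0, 1), (3, 1)]
print(degree_moments(toy))       # (1.5, 2.5)
print(has_giant_component(toy))  # True
```

For real data the pairs would come from the measured in- and out-degree of every node; the sketch only evaluates the two moments and does not test whether the network is actually random.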

Suppose a fraction $p$ of the nodes is removed from the network. (Alternatively, a fraction
$q = 1-p$ of the nodes is retained.) The original degree distribution, $P(j,k)$, becomes

$$P'(j,k) = \sum_{j_0,k_0}^{\infty} P(j_0,k_0) \binom{j_0}{j} (1-p)^{j} p^{\,j_0-j}
  \times \binom{k_0}{k} (1-p)^{k} p^{\,k_0-k}. \tag{4}$$

In view of this new distribution, Eq. (3) yields the percolation threshold

$$q_c = 1 - p_c = \frac{\langle k \rangle}{\langle jk \rangle}, \tag{5}$$

where averages are computed with respect to the original distribution before dilution, $P(j,k)$.
Eq. (5) indicates that in directed scale-free networks, if $\langle jk \rangle$ diverges then
$q_c \to 0$ and the network is resilient to random breakdown of nodes and bonds.
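
The step from the diluted distribution to the threshold can be made explicit. The sketch below assumes, as the binomial factors in Eq. (4) do, that each incoming and each outgoing link of a surviving site is kept independently with probability $q = 1-p$; primes denote averages over the diluted distribution $P'(j,k)$.

```latex
% Binomial thinning: conditional means given the original degrees (j_0, k_0)
\langle k \rangle' = \sum_{j_0,k_0} P(j_0,k_0)\, q\,k_0 = q\,\langle k \rangle ,
\qquad
\langle jk \rangle' = \sum_{j_0,k_0} P(j_0,k_0)\, (q\,j_0)(q\,k_0) = q^{2}\,\langle jk \rangle .

% The threshold is reached when the criterion becomes an equality for P'
\langle jk \rangle' = \langle k \rangle'
\;\Longrightarrow\;
q_c^{2}\,\langle jk \rangle = q_c\,\langle k \rangle
\;\Longrightarrow\;
q_c = \frac{\langle k \rangle}{\langle jk \rangle} .
```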

The term $\langle jk \rangle$ may be dramatically influenced by the appearance of correlations
between the in- and out-degrees of the nodes. In particular, let us consider scale-free
distributions for both the in- and out-degrees:

$$P_{\mathrm{in}}(j) = \begin{cases}
  B\,c_{\mathrm{in}}\, j^{-\lambda_{\mathrm{in}}}, & j \neq 0, \\
  1 - B, & j = 0,
\end{cases} \tag{6}$$

and

$$P_{\mathrm{out}}(k) = c_{\mathrm{out}}\, k^{-\lambda_{\mathrm{out}}}. \tag{7}$$

In Eq. (6) we choose to add the possible zero value to the in-degree in order to maintain
$\langle j \rangle = \langle k \rangle$. If the in- and out-degrees are uncorrelated, we expect
$\langle jk \rangle = \langle j \rangle \langle k \rangle$. For several real directed networks this
equality does not hold. For example, the WWW network of the University of Notre Dame has
$\langle k \rangle = \langle j \rangle \approx 4.6$, and thus
$\langle j \rangle \langle k \rangle = 21.16$. In contrast, measuring directly we find
$\langle jk \rangle \approx 200$, about an order of magnitude larger than the result expected for
the uncorrelated case. This yields an estimate of $q_c \approx 0.02$, i.e., a very stable directed
network. We obtained similar results for some metabolic networks [16], indicating that in real
directed networks the in- and out-degrees are correlated.
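
The arithmetic behind the Notre Dame estimate is short enough to reproduce. This sketch only plugs the two quoted moments into Eq. (5); the function name `threshold_from_moments` is a hypothetical helper introduced for illustration.

```python
def threshold_from_moments(mean_k: float, mean_jk: float) -> float:
    """Eq. (5): q_c = <k> / <jk>."""
    if mean_jk <= 0:
        raise ValueError("<jk> must be positive")
    return mean_k / mean_jk


mean_k = 4.6                      # <k> = <j> quoted in the text
measured_jk = 200.0               # directly measured <jk> quoted in the text
uncorrelated_jk = mean_k * mean_k  # <j><k>, the uncorrelated expectation

# Measured correlations: q_c is about 0.023, so roughly 98% of the nodes
# can be removed before the giant component disappears.
print(threshold_from_moments(mean_k, measured_jk))

# Without correlations the same <k> would give q_c of about 0.22.
print(threshold_from_moments(mean_k, uncorrelated_jk))
```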



To address correlations, we model them in the following manner. We first generate the $j$ values
for the entire network. Next, for each site with $j \neq 0$, with probability $A$ we generate $k$
fully correlated with $j$, i.e., $k = k(j)$. Assuming that $k(j)$ is a monotonically increasing
function, the requirement
$c_{\mathrm{out}} k^{-\lambda_{\mathrm{out}}}\,dk = c_{\mathrm{in}} j^{-\lambda_{\mathrm{in}}}\,dj$,
needed to keep the distributions scale-free, leads to
$k^{\lambda_{\mathrm{out}}-1} = j^{\lambda_{\mathrm{in}}-1}$. With probability $1-A$, the degree $k$
is chosen independently of $j$:

$$P(j,k) = \begin{cases}
  (1-A)\,B\,c_{\mathrm{in}}\, j^{-\lambda_{\mathrm{in}}}\, c_{\mathrm{out}}\, k^{-\lambda_{\mathrm{out}}}
    + B A\, c_{\mathrm{out}}\, k^{-\lambda_{\mathrm{out}}}\, \delta_{j,j(k)}, & j \neq 0, \\
  (1-B)\, c_{\mathrm{out}}\, k^{-\lambda_{\mathrm{out}}}, & j = 0,
\end{cases} \tag{8}$$

where $j(k) = k^{(\lambda_{\mathrm{out}}-1)/(\lambda_{\mathrm{in}}-1)}$. With this distribution,
any finite fraction $BA$ of fully correlated sites yields a diverging $\langle jk \rangle$ whenever

$$(\lambda_{\mathrm{out}} - 2)(\lambda_{\mathrm{in}} - 2) \le 1, \tag{9}$$

causing the percolation threshold to vanish (see Fig. 2).

FIG. 2. Phase diagram of the different regimes for the IN component of scale-free correlated
directed networks. The boundary between Resilient and Anomalous exponents is derived from Eq. (9),
while that between Anomalous exponents and Mean-field exponents is given by Eq. (24) for
$\lambda^* = 4$. For the diagram of the OUT component, $\lambda_{\mathrm{in}}$ and
$\lambda_{\mathrm{out}}$ change roles.
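
The divergence condition in Eq. (9) can be illustrated numerically. The sketch below assumes the fully correlated map $j(k) = k^{(\lambda_{\mathrm{out}}-1)/(\lambda_{\mathrm{in}}-1)}$ implied by the matching condition above, a minimal degree $m = 1$, and a hard cutoff `k_max`; the cutoff and the function names are assumptions of the example. It tracks how the correlated contribution to $\langle jk \rangle$ changes as the cutoff grows.

```python
def jk_diverges(lambda_in: float, lambda_out: float) -> bool:
    """Eq. (9): <jk> diverges when (lambda_out - 2)(lambda_in - 2) <= 1."""
    return (lambda_out - 2.0) * (lambda_in - 2.0) <= 1.0


def correlated_jk_partial_sum(lambda_in: float, lambda_out: float,
                              k_max: int) -> float:
    """Sum of k * j(k) * P_out(k) over k = 1..k_max (truncated, m = 1)."""
    gamma = (lambda_out - 1.0) / (lambda_in - 1.0)  # j(k) = k**gamma
    ks = range(1, k_max + 1)
    norm = sum(k ** -lambda_out for k in ks)        # normalizes P_out
    total = sum(k * k ** gamma * k ** -lambda_out for k in ks)
    return total / norm


for lam in (2.5, 3.5):
    sums = [correlated_jk_partial_sum(lam, lam, 10 ** e) for e in (3, 4, 5)]
    print(lam, jk_diverges(lam, lam), sums)
```

With equal exponents of 2.5 the partial sums keep growing with the cutoff, while for 3.5 they settle to a finite value, in line with Eq. (9).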

In the case of no correlations between the in- and out-degrees, $A = 0$, Eq. (8) becomes
$P(j,k) = P_{\mathrm{in}}(j) P_{\mathrm{out}}(k)$. Then the condition for the existence of a giant
component is $\langle k \rangle = \langle j \rangle = 1$. Moreover, Eq. (5) reduces to

$$q_c = 1 - p_c = \frac{1}{\langle k \rangle}. \tag{10}$$

Applying Eq. (10) to scale-free networks, one concludes that for $\lambda_{\mathrm{out}} > 2$ and
$\lambda_{\mathrm{in}} > 2$ a phase transition exists at a finite $q_c$. Here we concern ourselves
with the critical exponents associated with the percolation transition in scale-free networks with
$\lambda_{\mathrm{out}} > 2$ and $\lambda_{\mathrm{in}} > 2$, which is the most relevant regime
(Fig. 2).
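
For the uncorrelated case the threshold depends only on the mean degree, which is easy to evaluate for a truncated power law. This is a rough sketch assuming $m = 1$ and a hard cutoff `k_max`, neither of which is specified in the text; `mean_degree` and `uncorrelated_threshold` are hypothetical helper names.

```python
def mean_degree(lam: float, k_max: int = 100_000, m: int = 1) -> float:
    """<k> for P(k) proportional to k**(-lam) on m <= k <= k_max."""
    ks = range(m, k_max + 1)
    weights = [k ** -lam for k in ks]
    norm = sum(weights)
    return sum(k * w for k, w in zip(ks, weights)) / norm


def uncorrelated_threshold(lambda_out: float, k_max: int = 100_000) -> float:
    """q_c = 1 / <k> for uncorrelated in- and out-degrees (A = 0)."""
    return 1.0 / mean_degree(lambda_out, k_max)


for lam in (2.1, 2.5, 3.5):
    # The threshold stays finite for lambda_out > 2 and rises with lambda_out.
    print(lam, uncorrelated_threshold(lam))
```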

Percolation of the GWCC can be seen to be similar to percolation in the non-directed graph created
from the directed graph by ignoring the directionality of the links. The threshold is obtained from
the criterion

$$q_c = \frac{\langle k \rangle}{\langle k(k-1) \rangle}. \tag{11}$$

Here the connectivity distribution is the convolution of the in and out distributions,

$$P'(k) = \sum_{l} P(l, k-l). \tag{12}$$



Regardless of correlations, $P'(k)$ is always dominated by the slower decay exponent. Therefore
percolation of the GWCC is the same as in non-directed scale-free networks, with
$\lambda_{\mathrm{eff}} = \min(\lambda_{\mathrm{in}}, \lambda_{\mathrm{out}})$. Note that the
percolation threshold of the GWCC may differ from that of the GSCC and the IN and OUT components.

We now use the formalism of generating functions to analyze percolation of the GSCC and the IN and
OUT components. In [14] a generating function is built for the joint probability distribution of
outgoing and incoming degrees, before dilution:



$$\Phi(x,y) = \sum_{j,k} P(j,k)\, x^{j} y^{k}. \tag{13}$$



Using the approach of Callaway et al. [10], let $q(j,k)$ be the probability that a vertex of degree
$(j,k)$ remains in the network following dilution. The generating function after dilution is then

$$G(x,y) = \sum_{j,k} P(j,k)\, q(j,k)\, x^{j} y^{k}. \tag{14}$$

From Eq. (14) it is possible to define the generating function for the outgoing degrees, $G_0$:

$$G_0(y) \equiv G(1,y) = \sum_{j,k} P(j,k)\, q(j,k)\, y^{k}. \tag{15}$$



The probability of reaching a site by following a specific link is proportional to $jP(j,k)$.
Therefore, the probability to reach an occupied site following a specific directed link is
generated by

$$G_1(y) = \frac{\sum_{j,k} j\, P(j,k)\, q(j,k)\, y^{k}}{\sum_{j,k} j\, P(j,k)}. \tag{16}$$



Let $H_1(y)$ be the generating function for the probability of reaching an outgoing component of a
given size by following a directed link, after dilution. $H_1(y)$ satisfies the self-consistent
equation

$$H_1(y) = 1 - G_1(1) + y\, G_1(H_1(y)). \tag{17}$$

Since $G_0(y)$ is the generating function for the outgoing degree of a site, the generating
function for the probability that $n$ sites are reachable from a given site is

$$H_0(y) = 1 - G_0(1) + y\, G_0(H_1(y)). \tag{18}$$



For the case where correlations exist, and assuming random dilution, $q(j,k) = q$, Eqs. (17) and
(18) reduce to

$$H_1(y) = 1 - q + \frac{qy}{\langle j \rangle} \sum_{k}
  \bigl( A\, j(k) + (1-A) \langle j \rangle \bigr) P_{\mathrm{out}}(k)\, H_1(y)^{k}, \tag{19}$$

and

$$H_0(y) = 1 - q + qy \sum_{k} P_{\mathrm{out}}(k)\, H_1(y)^{k}. \tag{20}$$

If $A \to 0$, one expects that $H_0(y) = H_1(y)$, since there is no correlation between $j$ and
$k$; the probability to have $k$ outgoing edges is then $P_{\mathrm{out}}(k)$ whether we choose the
site randomly or weighted by the incoming edges $j$.

$H_0(1)$ is the probability to reach an outgoing component of any finite size when choosing a
site. Thus, below the percolation transition $H_0(1) = 1$, while above the transition there is a
finite probability to follow a directed link to a site which is the root of an infinite outgoing
component: $P_\infty = 1 - H_0(1)$. It follows that

$$P_\infty(q) = q \Bigl( 1 - \sum_{k} P_{\mathrm{out}}(k)\, u^{k} \Bigr), \tag{21}$$

where $u \equiv H_1(1)$ is the smallest positive root of

$$u = 1 - q + \frac{q}{\langle j \rangle} \sum_{k}
  \bigl( A\, j(k) + (1-A) \langle j \rangle \bigr) P_{\mathrm{out}}(k)\, u^{k}. \tag{22}$$

Here $P_\infty(q)$ is the fraction of sites from which an infinite number of sites is reachable.
Eq. (22) can be solved numerically and the solution may be substituted into Eq. (21), yielding the
size of the IN component at dilution $p = 1 - q$.
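
One way to carry out this numerical solution is fixed-point iteration started from $u = 0$, which approaches the smallest positive root from below. The sketch below is an illustration under several assumptions that are not in the text: $m = 1$, a hard cutoff `k_max`, the correlated map $j(k) = k^{(\lambda_{\mathrm{out}}-1)/(\lambda_{\mathrm{in}}-1)}$, and a numerical rescaling of $j(k)$ so that the link-weighted distribution sums to one on the truncated support. The function name `in_component_size` is hypothetical.

```python
def in_component_size(q: float, lambda_in: float, lambda_out: float,
                      A: float = 0.0, k_max: int = 1000,
                      tol: float = 1e-12, max_iter: int = 100_000) -> float:
    """Iterate Eq. (22) for u, then evaluate P_infinity from Eq. (21)."""
    ks = list(range(1, k_max + 1))
    raw = [k ** -lambda_out for k in ks]
    norm = sum(raw)
    p_out = [r / norm for r in raw]                 # truncated P_out(k)

    gamma = (lambda_out - 1.0) / (lambda_in - 1.0)  # assumed j(k) = k**gamma
    j_of_k = [k ** gamma for k in ks]
    # Assumption: rescale j(k) by its own mean so the weights sum to one.
    j_mean = sum(j * p for j, p in zip(j_of_k, p_out))
    link_w = [(A * j / j_mean + (1.0 - A)) * p
              for j, p in zip(j_of_k, p_out)]

    u = 0.0
    for _ in range(max_iter):
        u_next = 1.0 - q + q * sum(w * u ** k for w, k in zip(link_w, ks))
        converged = abs(u_next - u) < tol
        u = u_next
        if converged:
            break
    return q * (1.0 - sum(p * u ** k for p, k in zip(p_out, ks)))


# Uncorrelated case (A = 0): below and above q_c = 1/<k>, about 0.53 here.
print(in_component_size(0.3, 2.5, 2.5))
print(in_component_size(0.9, 2.5, 2.5))
# Same dilution as the first call, with half of the sites fully correlated.
print(in_component_size(0.3, 2.5, 2.5, A=0.5))
```

In this truncated example the uncorrelated network has no giant component at $q = 0.3$, while the correlated one still does, which is the qualitative effect of correlations described above. Convergence becomes slow close to the threshold, so `max_iter` may need to be raised there.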

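Near criticality, the probability to start from a site and reach a giant outgoing component
follows $P_\infty \sim (q - q_c)^{\beta}$. For mean-field systems (such as infinite-dimensional
systems, random graphs and Cayley trees) it is known that $\beta = 1$ [19]. This regular mean-field
result is not always valid. Instead, following [20], we study the behavior of Eq. (22) near
$q = q_c$, $u = 1$, and find

$$\beta = \begin{cases}
  \dfrac{1}{3 - \lambda^*}, & 2 < \lambda^* < 3, \\[2ex]
  \dfrac{1}{\lambda^* - 3}, & 3 < \lambda^* < 4, \\[2ex]
  1, & \lambda^* > 4,
\end{cases} \tag{23}$$

where

$$\lambda^* = \lambda_{\mathrm{out}}
  + \frac{\lambda_{\mathrm{in}} - \lambda_{\mathrm{out}}}{\lambda_{\mathrm{in}} - 1}. \tag{24}$$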



We see that the order parameter exponent $\beta$ attains its usual mean-field value only for
$\lambda^* > 4$. As $\lambda_{\mathrm{out}} \to \lambda_{\mathrm{in}}$, the correlated fraction
$BA$ of sites resembles non-directed networks [20,21] (where there is no distinction between
incoming and outgoing degrees). In this case we get
$\lambda^* = \lambda_{\mathrm{out}} = \lambda_{\mathrm{in}}$ for any amount of correlation $A$. The
criterion for the existence of a giant component is then
$\langle k^2 \rangle / \langle k \rangle = 1$, and not 2 as in the non-directed case. The
difference stems from the fact that in the non-directed case one of the links is used to reach the
site, while in the directed case there is generally no correlation between the location of the
incoming and outgoing links. Therefore, one more outgoing link is available for leaving the site.
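
The exponents can be evaluated mechanically once $\lambda^*$ is known. The sketch below assumes, for the IN component of a correlated network, $\lambda^* = \lambda_{\mathrm{out}} + (\lambda_{\mathrm{in}} - \lambda_{\mathrm{out}})/(\lambda_{\mathrm{in}} - 1)$ and the branch values $\beta = 1/(3-\lambda^*)$, $1/(\lambda^*-3)$ and $1$ for Eqs. (23) and (24). These expressions, and the names `lambda_star`, `beta_exponent` and `regime`, are assumptions of this example and should be checked against the published equations.

```python
def lambda_star(lambda_in: float, lambda_out: float) -> float:
    """Assumed form of Eq. (24) for the IN component with correlations."""
    return lambda_out + (lambda_in - lambda_out) / (lambda_in - 1.0)


def beta_exponent(lam_star: float) -> float:
    """Assumed branches of Eq. (23) for the order-parameter exponent."""
    if lam_star <= 2.0:
        raise ValueError("lambda* must exceed 2")
    if lam_star == 3.0:
        raise ValueError("beta is not defined at the boundary lambda* = 3")
    if lam_star < 3.0:
        return 1.0 / (3.0 - lam_star)
    if lam_star < 4.0:
        return 1.0 / (lam_star - 3.0)
    return 1.0


def regime(lambda_in: float, lambda_out: float) -> str:
    """Label a point of the IN-component phase diagram (Fig. 2)."""
    if (lambda_out - 2.0) * (lambda_in - 2.0) <= 1.0:  # Eq. (9)
        return "resilient"
    if lambda_star(lambda_in, lambda_out) < 4.0:
        return "anomalous"
    return "mean-field"


for lam in (2.5, 3.5, 5.0):
    ls = lambda_star(lam, lam)
    print(lam, ls, beta_exponent(ls), regime(lam, lam))
```

With these assumed forms the resilient boundary of Eq. (9) coincides with $\lambda^* = 3$; for example $\lambda_{\mathrm{in}} = 2.5$, $\lambda_{\mathrm{out}} = 4$ gives a product of exactly 1 and $\lambda^* = 3$.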

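Without any correlations, $A = 0$, different terms prevail in the analysis and

$$\beta = \begin{cases}
  \dfrac{1}{\lambda_{\mathrm{out}} - 2}, & 2 < \lambda_{\mathrm{out}} < 3, \\[2ex]
  1, & \lambda_{\mathrm{out}} > 3.
\end{cases} \tag{25}$$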



A connection is found between non-directed and directed scale-free percolation exponents for any
finite correlation between the in- and out-degrees. In the uncorrelated case, i.e.,
$P(j,k) = P_{\mathrm{in}}(j) P_{\mathrm{out}}(k)$, the probability to reach an outgoing component
does not bear any dependence upon $P_{\mathrm{in}}(j)$. The results are summarized in Table I.





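Component   Uncorrelated                                              Correlated
---------   ------------                                              ----------
GWCC        $\min(\lambda_{\mathrm{out}}, \lambda_{\mathrm{in}}) + 1$   $\min(\lambda_{\mathrm{out}}, \lambda_{\mathrm{in}})$
IN          $\lambda_{\mathrm{out}} + 1$                                $\lambda_{\mathrm{out}} + (\lambda_{\mathrm{in}} - \lambda_{\mathrm{out}})/(\lambda_{\mathrm{in}} - 1)$
OUT         $\lambda_{\mathrm{in}} + 1$                                 $\lambda_{\mathrm{in}} + (\lambda_{\mathrm{out}} - \lambda_{\mathrm{in}})/(\lambda_{\mathrm{out}} - 1)$
GSCC        $\min(\lambda_{\mathrm{out}}, \lambda_{\mathrm{in}}) + 1$   $\min(\lambda^*_{\mathrm{out}}, \lambda^*_{\mathrm{in}})$

TABLE I. Values of $\lambda^*$ for the different network components for both correlated and
uncorrelated cases.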



Eq. (25) is the same as Eq. (23) but with $\lambda^* = \lambda_{\mathrm{out}} + 1$.
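
This correspondence can be verified by direct substitution. The check below assumes the branch values $\beta = 1/(\lambda^* - 3)$ for $3 < \lambda^* < 4$ and $\beta = 1$ for $\lambda^* > 4$ in Eq. (23); if the published branches differ, the substitution should be redone accordingly.

```latex
% Substitute \lambda^* = \lambda_{out} + 1 into the assumed branches of Eq. (23)
3 < \lambda^* < 4 \;\Longleftrightarrow\; 2 < \lambda_{\mathrm{out}} < 3 :
\qquad
\beta = \frac{1}{\lambda^* - 3} = \frac{1}{\lambda_{\mathrm{out}} - 2} ,

\lambda^* > 4 \;\Longleftrightarrow\; \lambda_{\mathrm{out}} > 3 :
\qquad
\beta = 1 .

% The remaining branch, 2 < \lambda^* < 3, would require 1 < \lambda_{out} < 2,
% which lies outside the regime \lambda_{out} > 2 considered here.
```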

The GSCC is the intersection of the IN and OUT components. Therefore, it behaves as the smaller of
the two components: $\beta_{\mathrm{GSCC}} = \max(\beta_{\mathrm{in}}, \beta_{\mathrm{out}})$. This
can also be derived by applying the same methods as for the IN and OUT components to the generating
function of the GSCC obtained in earlier work. The exponent for the GWCC, on the other hand, is
independent of the exponents of the other components, since the transition point is different.

It is known that for a random graph of arbitrary degree distribution the finite clusters follow
the scaling form

$$n(s) \sim s^{-\tau} e^{-s/s^*}, \tag{26}$$

where $s$ is the cluster size and $n(s)$ is the number of clusters of size $s$. At criticality
$s^* \sim |q - q_c|^{-\sigma}$ diverges and the tail of the distribution follows a power law.

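The probability that $s$ sites can be reached from a site by following links at criticality
follows $p(s) \sim s^{-\tau}$, and is generated by $H_0$, where $H_0(y) = \sum_{s} p(s)\, y^{s}$.
As in [21], $H_0(y)$ can be expanded from Eq. (19). In the presence of correlations we find

$$\tau = \begin{cases}
  1 + \dfrac{1}{\lambda^* - 2}, & 2 < \lambda^* < 4, \\[2ex]
  \dfrac{3}{2}, & \lambda^* > 4.
\end{cases} \tag{27}$$

The regular mean-field exponents are recovered for $\lambda^* > 4$. For the uncorrelated case we
get

$$\tau = \begin{cases}
  1 + \dfrac{1}{\lambda_{\mathrm{out}} - 1}, & 2 < \lambda_{\mathrm{out}} < 3, \\[2ex]
  \dfrac{3}{2}, & \lambda_{\mathrm{out}} > 3.
\end{cases} \tag{28}$$

Now the regular mean-field results are obtained for $\lambda_{\mathrm{out}} > 3$.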

In summary, we calculate the percolation properties of directed scale-free networks. We find that
the percolation critical exponents in scale-free networks are strongly dependent upon the
existence of correlations and upon the degree distribution exponents in the range
$2 < \lambda^* < 4$. This regime characterizes most naturally occurring networks, such as
metabolic networks or the WWW. The regular mean-field behavior of percolation in infinite
dimensions is recovered only for $\lambda^* > 4$.